Abstract

This paper describes a comprehensive assessment of a new high-resolution, high-quality 
gauge-satellite based analysis of daily precipitation over continental South America 
during 2004. The methodology combines additive and multiplicative bias correction
schemes in order to obtain the lowest bias relative to the observed values.
Inter-comparisons and cross-validation tests have been carried out for the control
algorithm (TMPA real-time algorithm) and different merging schemes: additive bias
correction (ADD), ratio bias correction (RAT) and the TMPA research version. The tests
cover months belonging to different seasons and different network densities. All compared
merging schemes produce better results than the control algorithm, and the improvement
is remarkable when gauge datasets at finer temporal (daily) and spatial (regional network)
scales are included in the analysis. The Combined Scheme (CoSch) consistently shows
the best performance among the five techniques. This is also true when a degraded daily
gauge network is used instead of the full dataset. This technique appears to be a
suitable tool to produce real-time, high-resolution, high-quality gauge-satellite based
analyses of daily precipitation over land in regional domains.



1. Introduction


The spatial and temporal distribution of precipitation around the globe is needed for a 
variety of scientific uses such as climate diagnostic studies, and societal applications such 
as water management for agriculture and power, drought relief, flood control and flood 
forecasting (Arkin and Xie, 1994). The task of quantifying the distribution is complicated 
by the fact that no single currently available estimate of precipitation has the necessary 
coverage and accuracy over the whole globe. While a suite of sensors flying on a variety
of satellites has been used to estimate precipitation on a global basis, generally
speaking, the performance of satellite precipitation estimates over land areas is highly
dependent on the rainfall regime and the temporal and spatial scale of the retrievals
(Ebert et al., 2007). On the other hand, gauge observations continue to play a critical role
in observation systems over global land areas. In addition, gauges are the only source
of precipitation information obtained through direct measurement. Both radar and satellite
estimates are indirect in nature and need to be calibrated or verified using gauge
observations (Xie and Arkin 1995; Ebert et al., 2007). While it is possible to create
rainfall estimates using a combination of different satellite data (e.g., CMORPH; Joyce et
al. 2004), researchers have increasingly moved to using ‘the best of both worlds’ in order
to improve accuracy, coverage, and resolution. The first such combinations were
performed at a relatively coarse scale to ensure reasonable error characteristics. For
example, the Global Precipitation Climatology Project (GPCP) satellite-gauge (SG)
combination is computed on a monthly 2.5° x 2.5° latitude-longitude grid (Huffman et al.
1997; Adler et al. 2003), while finer-scale products initiated by the GPCP include the
Pentad (Xie et al. 2003) and One-Degree Daily (Huffman et al. 2001) combination
estimates of precipitation.

The GPCP combination method is designed to use the strengths of each input dataset to
produce merged global, monthly precipitation fields that are superior to any of the
individual datasets. The technique is also designed to reduce bias at each step by using
the input (original or intermediate) product with the smallest (or zero) presumed bias to
adjust the bias of the other products. A large-scale (5 x 5 grid box) average of the
multisatellite analysis is adjusted to agree with the large-scale average of the gauges (over
land and where available). This keeps the bias of the satellite and gauge combination
close to the (presumably small) bias of the gauge analysis on a regional scale. Finally, the
gauge-adjusted multisatellite estimate and the gauge analysis are combined with
inverse-error-variance weighting to produce the final merged analysis. This gauge-satellite
combination approach allows the multisatellite estimate to provide important local
variations in gauge-sparse areas, while still retaining the overall gauge bias (Adler et al.,
2003). In this case, the monthly gauge analysis is performed by the Global Precipitation
Climatology Centre (GPCC). These gauge data are analyzed using the empirical interpolation
method SPHEREMAP (Willmott et al., 1985), which has been used routinely at the GPCC since
1991 to compute gridded values at 0.5° latitude/longitude resolution.
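The two GPCP steps just described (a large-scale gauge adjustment of the multisatellite estimate, followed by inverse-error-variance weighting) can be illustrated for a single grid box. This is a minimal sketch, not the GPCP code: it assumes the large-scale adjustment is a simple ratio of large-scale means, and the function names and the error variances passed in are hypothetical inputs supplied by the caller.

```python
"""Minimal sketch of a GPCP-style satellite-gauge combination for one grid box.

Assumptions (not taken from the GPCP documentation): the large-scale
adjustment is a ratio of large-scale means, and error variances are given.
"""


def gauge_adjust(ms_value, ms_large_scale_mean, gauge_large_scale_mean):
    """Scale a multisatellite (MS) value so its large-scale mean matches the gauges."""
    if ms_large_scale_mean <= 0.0:
        # No satellite rain at the large scale: a ratio is undefined, keep MS as is.
        return ms_value
    return ms_value * gauge_large_scale_mean / ms_large_scale_mean


def inverse_variance_merge(sat_value, sat_err_var, gauge_value, gauge_err_var):
    """Weight each estimate by the inverse of its (assumed known) error variance."""
    w_sat = 1.0 / sat_err_var
    w_gauge = 1.0 / gauge_err_var
    return (w_sat * sat_value + w_gauge * gauge_value) / (w_sat + w_gauge)


if __name__ == "__main__":
    # Hypothetical monthly totals in mm for one grid box and its 5 x 5 neighbourhood.
    adjusted = gauge_adjust(120.0, ms_large_scale_mean=100.0, gauge_large_scale_mean=80.0)
    merged = inverse_variance_merge(adjusted, 400.0, 90.0, 100.0)
    print(adjusted, round(merged, 2))
```

Because the gauge value has the smaller error variance in this example, the merged value is pulled toward it; in a gauge-sparse box a large gauge error variance would let the adjusted satellite value dominate.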

The One-Degree Daily methodology uses GPCP retrievals by scaling the short-period
estimates to sum to a monthly estimate that includes monthly gauge data (Huffman et al.
2001). A similar approach is used in Huffman et al. (2007; H07 hereafter) to scale 3-hourly
estimates from the real-time TMPA (TRMM Multisatellite Precipitation Analysis): all
available 3-hourly merged estimates are summed over a calendar month to create a
monthly multisatellite (MS) product. The MS and gauge analyses are combined as in
Huffman et al. (1997) to create a post-real-time monthly SG combination. Then the
field of SG/MS ratios is computed on the 0.25° x 0.25° grid (with controls) and applied to
scale each 3-hourly field in the month, producing the research version, also called
version 6 of the 3B42 product (3B42V6 hereafter).
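The monthly rescaling that produces 3B42V6 can be sketched for one 0.25° grid box. This is an illustration under stated assumptions, not the TMPA code: the 3-hourly values are treated as accumulations in mm, and the optional `max_ratio` cap is a hypothetical stand-in for the unspecified "controls" on the SG/MS ratio.

```python
def rescale_month(three_hourly_mm, sg_monthly_mm, max_ratio=None):
    """Scale one grid box's 3-hourly multisatellite values by the monthly SG/MS ratio.

    three_hourly_mm: 3-hourly accumulations in mm (assumption) for the whole month.
    sg_monthly_mm: monthly satellite-gauge (SG) total for the same grid box.
    max_ratio: hypothetical cap standing in for the ratio "controls".
    """
    ms_monthly_mm = sum(three_hourly_mm)  # monthly multisatellite (MS) total
    if ms_monthly_mm <= 0.0:
        # No satellite rain in the month: the ratio is undefined, so leave values unchanged.
        return list(three_hourly_mm)
    ratio = sg_monthly_mm / ms_monthly_mm
    if max_ratio is not None:
        ratio = min(ratio, max_ratio)
    return [value * ratio for value in three_hourly_mm]


if __name__ == "__main__":
    slots = [0.0, 1.0, 2.0, 1.0]  # hypothetical, heavily shortened "month"
    print(rescale_month(slots, sg_monthly_mm=8.0))
    print(rescale_month(slots, sg_monthly_mm=8.0, max_ratio=1.5))
```

Without a cap, the rescaled values sum to the SG total, so the sub-monthly timing comes from the satellites while the monthly amount comes from the gauge-informed SG field.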

Among multiple applications, fine-resolution precipitation, together with increasing
computational capacity, enables operational and research studies in hydrology across
different temporal and spatial scales. However, resolving the interaction between different
scales in land surface hydrology and atmospheric dynamics still needs progress (Tao et al.,
2003), and well-balanced high-resolution precipitation datasets play an essential role in
such land-atmosphere interactions. One of the motivations for this paper is the potential
use of high-resolution atmospheric datasets for land surface hydrology studies and
numerical modeling over South America by combining surface observations with
remotely sensed information. Such data fusion was made possible by the onset of Land
Data Assimilation System (LDAS) initiatives (Mitchell et al., 2004; Rodell et al., 2004).
A South American LDAS (SALDAS; de Goncalves et al., 2006a, b) is particularly
challenging because it proposes to combine high-resolution remote sensing and surface
observations using land surface models (LSMs) over a continent with sparse observation
networks. Precipitation (along with radiation) is one of the most important drivers of
LSMs, which motivates this paper as part of the efforts to combine satellite precipitation
with rain gauges for SALDAS forcing composition and evaluation (de Goncalves et al.,
2007).



This paper describes and evaluates a new methodology to merge satellite rainfall
estimates and daily gauge data. Here, the real-time TMPA (which incorporates no rain
gauges) is used as the high-quality satellite rainfall algorithm (H07), while the CPTEC daily
rain gauge database is used to correct its bias on a daily basis over South America. Section 2
describes the datasets used in this paper, and section 3 presents the merging methodology.
The experimental design, the validation scheme and the results are discussed in section 4,
and the conclusions are presented in section 5.

2. Datasets 

a. Real-time TRMM Multisatellite Precipitation Analysis 

This algorithm is fully described in H07; its main features, including the real-time
adjustment, are outlined in this section. The first stage of the algorithm consists of
calibrating and combining microwave precipitation estimates. Passive microwave
fields of view (FOVs) from TMI, AMSR-E, and SSM/I are converted to precipitation
estimates at the TRMM Science Data and Information System (TSDIS) with sensor-specific
versions of the Goddard Profiling Algorithm (GPROF; Kummerow et al. 1996;
Olson et al. 1999), while AMSU-B FOVs are converted to precipitation estimates at the
National Environmental Satellite, Data, and Information Service (NESDIS) with the
operational version of the Weng et al. (2003) algorithm. In the real-time version,
the calibration uses TMI estimates from TRMM because the TCI (combined TMI-PR)
estimates are not available in real time. Also in this version, the calibration coefficients
are computed from the last 6 available pentads (5-day periods).

In the second step, infrared (IR) precipitation estimates are created using the calibrated
microwave precipitation. Time-space matched combined microwave precipitation rates
(high quality) and IR brightness temperatures (Tb), each represented on the same 3-hourly
0.25° x 0.25° grid, are accumulated over five trailing pentads and the current (partial)
pentad (in the real-time version) into histograms on a 1° x 1° grid, aggregated to
overlapping 3° x 3° windows, and then used to create spatially varying calibration
coefficients that convert IR Tb to precipitation rates. In the final stage, the microwave and
IR estimates are combined. The physically based combined microwave estimates are
taken “as is” where available, and the remaining grid boxes are filled with
microwave-calibrated IR estimates. A detailed description of this algorithm can be found in
H07. The daily accumulation is obtained by summing the individual 3-hourly files from the
15:00Z file of the previous day (covering 12:00Z to 15:00Z) to the 12:00Z file (covering
09:00Z to 12:00Z) of the current day.
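The IR calibration step can be illustrated with a simple probability (histogram) matching: colder brightness temperatures are paired with heavier microwave rain rates of the same rank, and the resulting table converts new Tb values to rain rates. This is a simplified sketch, assuming a single calibration window with equal-sized matched samples instead of the 1° histograms and overlapping 3° x 3° windows used in H07; all function names are hypothetical. The last function applies the "microwave as is, IR elsewhere" rule.

```python
import bisect


def build_ir_calibration(mw_rates, ir_tb):
    """Pair IR Tb and MW rain rates by rank: coldest Tb <-> largest rain rate.

    Assumption: one calibration window with equal-length matched samples.
    Returns (sorted_tb, rates) with Tb ascending and rain rate non-increasing.
    """
    if len(mw_rates) != len(ir_tb):
        raise ValueError("matched samples must have the same length")
    return sorted(ir_tb), sorted(mw_rates, reverse=True)


def ir_to_rain(tb, calibration):
    """Look up the rain rate for one Tb from the rank-matched table."""
    sorted_tb, rates = calibration
    i = bisect.bisect_left(sorted_tb, tb)  # index of first calibrated Tb >= tb
    if i == len(sorted_tb):
        return 0.0  # warmer than every calibration sample: assume no rain
    return rates[i]


def merge_mw_ir(mw_field, ir_field):
    """Use MW where available (None marks a gap), otherwise the calibrated IR value."""
    return [mw if mw is not None else ir for mw, ir in zip(mw_field, ir_field)]


if __name__ == "__main__":
    cal = build_ir_calibration(mw_rates=[0.0, 2.0, 10.0], ir_tb=[280.0, 200.0, 240.0])
    ir_field = [ir_to_rain(tb, cal) for tb in (200.0, 230.0, 300.0)]
    print(ir_field)
    print(merge_mw_ir([5.0, None, None], ir_field))
```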

b. Rain gauge database 

In addition to the reports of the World Meteorological Organization (WMO), the daily
surface precipitation observations used here come from an INPE compilation of data from
the following agencies: (a) Agência Nacional de Energia Elétrica
(ANEEL; National Agency for Electrical Energy), (b) Agência Nacional de Águas
(ANA; National Water Agency), (c) Fundação Cearense de Meteorologia e Recursos
Hídricos (FUNCEME; Meteorology and Hydrologic Resources Foundation of Ceará), (d)
Superintendência do Desenvolvimento do Nordeste (SUDENE; Superintendence for
Development of the Northeast), (e) Departamento de Águas e Energia Elétrica do Estado
de São Paulo (DAEE; Department of Water and Electrical Energy for the State of São
Paulo), in collaboration with the Centro de Previsão de Tempo e Estudos Climáticos
(CPTEC; Brazilian Weather Forecast and Climate Studies Center), and (f) Technological
Institute of Paraná (SIMEPAR). In all cases, the accumulation period runs from 12:00Z of
the previous day to 12:00Z of the current day. It is important to point out that, with the
addition of observations from local South American agencies and Brazilian automated
weather stations, the CPTEC/INPE database contains roughly four times as many
observations as the GPCC dataset. This addition should help retain the overall gauge bias
at finer scales.
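Since both the TMPA daily accumulation (section 2a) and the gauge reports cover 12:00Z to 12:00Z, each gauge day corresponds to eight 3-hourly TMPA files. The sketch below lists their nominal times under the file-labelling convention stated in section 2a (the file stamped 15:00Z covers 12:00Z to 15:00Z). The helper names are hypothetical, and it assumes the files store mean rates in mm/h, so each value is multiplied by 3 h before summing.

```python
from datetime import date, datetime, timedelta


def tmpa_slots_for_gauge_day(day):
    """Nominal times of the eight 3-hourly files making up the 24 h ending 12:00Z on `day`.

    Convention from section 2a: the file stamped 15:00Z covers 12:00Z-15:00Z,
    and the file stamped 12:00Z covers 09:00Z-12:00Z.
    """
    end = datetime(day.year, day.month, day.day, 12)
    first = end - timedelta(hours=21)  # 15:00Z on the previous day
    return [first + timedelta(hours=3 * k) for k in range(8)]


def daily_total_mm(rates_mm_per_h):
    """Sum eight 3-hourly mean rates (assumed mm/h) into a 24 h accumulation in mm."""
    if len(rates_mm_per_h) != 8:
        raise ValueError("expected eight 3-hourly values")
    return sum(3.0 * rate for rate in rates_mm_per_h)


if __name__ == "__main__":
    for stamp in tmpa_slots_for_gauge_day(date(2004, 1, 15)):
        print(stamp.isoformat())
    print(daily_total_mm([0.5] * 8))
```

Using `timedelta` arithmetic keeps month and leap-year boundaries correct, e.g. the gauge day ending on 1 March 2004 starts with the file stamped 15:00Z on 29 February.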

3. Merging Methodology 

Constructing a technique for merging daily rainfall estimates over land from a
satellite-based algorithm and a rain gauge network involves three major issues:
1) defining the algorithm(s) to be used in the merging process; 2) designing the merging
technique; and 3) defining a validation strategy to assess the results.

Regarding the first issue, the Experimental Real-Time daily TRMM Multi-Satellite
Precipitation Analysis (TMPA-RT) (H07) is used as the base (control) algorithm for
retrieving precipitation, because TMPA approximately reproduces the surface
observation-based histogram of precipitation and reasonably detects large daily events.

Regarding the second issue, two options are available to merge rain gauges and satellite
retrievals: interpolate the observed values and then merge both fields, or compute the bias
of the satellite retrievals at the gauges and then apply an interpolation technique to obtain
the merged field. Some studies have argued that anomaly schemes (additive and
multiplicative) are suitable to remove the bias of satellite estimates because part of the
spatial variation of total precipitation is associated with finer-scale processes (topography,
local circulation, etc.) and is thus steady (Dai et al., 1997). Here, additive and
multiplicative correction schemes are used to remove the bias of satellite retrievals. The
multiplicative scheme assumes that the ratio between the observed and estimated values is
suitable to remove the bias of satellite retrievals on a daily basis, but it cannot determine the
magnitude of the precipitation when the retrieved satellite value is zero and the observed
value is not (e.g., warm clouds and/or clouds with no ice structure). On the other hand, the
additive correction scheme produces large differences when there are large discrepancies
between the observed and estimated values.

The proposed scheme, hereafter called Combined Scheme (CoSch), combines these two 
approaches in a single method to remove the bias of satellite estimations. 

The additive bias correction (ADD) is defined as follows: 

$$rr^{+} = rr_{sat} + (rr_{obs} - rr_{sat}) \qquad (1)$$

where $rr_{sat}$ is the satellite-based estimate and $rr_{obs}$ is the observed 24-hour
accumulated rainfall. The bias between the observed rainfall and the satellite retrievals is
represented by the second term, $(rr_{obs} - rr_{sat})$. This bias is interpolated using the
inverse distance weighting algorithm.
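Equation (1) with an inverse-distance-weighted bias field can be sketched as follows. This is an illustrative implementation, not the authors' code: the IDW power of 2, plain Euclidean distance in grid coordinates, and clipping negative results to zero are all assumptions, and the function names are hypothetical.

```python
def idw(x, y, points, power=2.0):
    """Inverse-distance-weighted value at (x, y) from (px, py, value) points.

    Assumptions: Euclidean distance in grid coordinates and power 2 by default.
    """
    num = den = 0.0
    for px, py, value in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        if d2 == 0.0:
            return value  # exactly on a station: return its value
        weight = d2 ** (-power / 2.0)  # equals 1 / distance**power
        num += weight * value
        den += weight
    return num / den if den > 0.0 else 0.0  # no gauges: zero bias


def additive_correction(sat, coords, gauges):
    """ADD scheme: rr+ = rr_sat + interpolated (rr_obs - rr_sat).

    sat: satellite values per pixel; coords: (x, y) per pixel.
    gauges: (x, y, rr_obs, rr_sat_at_gauge) tuples.
    """
    biases = [(gx, gy, obs - s) for gx, gy, obs, s in gauges]
    # Clipping at zero is an assumption: the additive bias can push rain negative.
    return [max(0.0, rr + idw(x, y, biases)) for rr, (x, y) in zip(sat, coords)]


if __name__ == "__main__":
    gauges = [(0.0, 0.0, 5.0, 3.0), (2.0, 0.0, 1.0, 3.0)]  # hypothetical stations
    print(additive_correction([4.0, 3.0], [(1.0, 0.0), (0.0, 0.0)], gauges))
```

Midway between a gauge with a +2 mm bias and one with a -2 mm bias, the interpolated bias is zero and the satellite value is kept, while on top of a gauge the corrected value reproduces the observation.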

The ratio bias correction (RAT) is performed according to the following equation: 

$$rr^{*} = rr_{sat} \left( \frac{rr_{obs}}{rr_{sat}} \right) \qquad (2)$$

where the same conventions as in the additive bias correction (Equation 1) are used.
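A matching sketch of Equation (2) is below. The paper does not state how the ratio field is spread between gauges, so this sketch assumes the same IDW interpolation as for ADD and skips gauges whose collocated satellite value is below a hypothetical `min_sat` threshold, where the ratio is undefined. It also shows the limitation noted in the text: where the satellite estimate is zero, RAT returns zero whatever the gauges report.

```python
def idw(x, y, points, power=2.0):
    """Inverse-distance-weighted value at (x, y) from (px, py, value) points."""
    num = den = 0.0
    for px, py, value in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        if d2 == 0.0:
            return value
        weight = d2 ** (-power / 2.0)
        num += weight * value
        den += weight
    return num / den if den > 0.0 else 1.0  # no usable gauges: ratio 1, satellite unchanged


def ratio_correction(sat, coords, gauges, min_sat=0.1):
    """RAT scheme: rr* = rr_sat * interpolated (rr_obs / rr_sat).

    gauges: (x, y, rr_obs, rr_sat_at_gauge) tuples.
    min_sat (mm) is a hypothetical threshold; IDW for ratios is an assumption.
    """
    ratios = [(gx, gy, obs / s) for gx, gy, obs, s in gauges if s >= min_sat]
    return [rr * idw(x, y, ratios) for rr, (x, y) in zip(sat, coords)]


if __name__ == "__main__":
    # The second gauge saw 5 mm where the satellite saw nothing: RAT cannot use it.
    gauges = [(0.0, 0.0, 6.0, 3.0), (2.0, 0.0, 5.0, 0.0)]
    print(ratio_correction([4.0, 0.0], [(1.0, 0.0), (2.0, 0.0)], gauges))
```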

After this procedure, rain gauge observations are mapped onto the grid using the
nearest-neighbor method (no explicit interpolation; the original values are retained),
masking out all pixels farther than 5 grid points from the closest station. The grid size is
0.25°, matching the satellite estimates.
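The nearest-neighbor step and its 5-grid-point mask might look like this on a small grid. The sketch assumes Euclidean distance measured in grid cells (the paper does not say which distance is used) and marks masked pixels with `None`; `nearest_gauge_field` is a hypothetical name, and the brute-force search is only meant for illustration.

```python
import math

MAX_DIST_CELLS = 5  # from the text: mask pixels more than 5 grid points (0.25 deg each) away


def nearest_gauge_field(nrows, ncols, gauges, max_dist=MAX_DIST_CELLS):
    """Assign each pixel the value of its closest gauge, or None beyond max_dist.

    gauges: (row, col, rr_obs) tuples on the 0.25 deg grid; values are not interpolated.
    Distance is Euclidean in grid cells (assumption).
    """
    field = []
    for i in range(nrows):
        row = []
        for j in range(ncols):
            best_value, best_dist = None, math.inf
            for gi, gj, value in gauges:
                dist = math.hypot(i - gi, j - gj)
                if dist < best_dist:
                    best_value, best_dist = value, dist
            row.append(best_value if best_dist <= max_dist else None)
        field.append(row)
    return field


if __name__ == "__main__":
    for row in nearest_gauge_field(2, 8, [(0, 0, 3.0), (1, 7, 9.0)]):
        print(row)
```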

In the last stage, the bias-corrected rainfall is defined as follows: the satellite estimate is
left uncorrected in the areas masked out in the previous step, while elsewhere the
bias-corrected rainfall is the value of whichever correction scheme (ratio or additive) is
closest to the nearest observed value (defined in the previous step), evaluated pixel by
pixel.
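The per-pixel decision rule of the combined scheme reduces to a few comparisons. This sketch assumes the ADD, RAT and nearest-gauge fields have already been computed on the same grid, with masked pixels marked as `None`; ties are resolved in favour of ADD, an arbitrary choice since the paper does not say. The function name is hypothetical. It also returns the chosen scheme per pixel, which is the quantity summarized in Figures 1 and 2a.

```python
def combined_scheme(rr_sat, rr_add, rr_rat, rr_near):
    """CoSch: per pixel, keep the ADD or RAT value closest to the nearest gauge value.

    rr_near holds the nearest-gauge value, or None where the pixel is masked.
    Returns (merged values, chosen scheme per pixel).
    """
    merged, choice = [], []
    for sat, add, rat, obs in zip(rr_sat, rr_add, rr_rat, rr_near):
        if obs is None:
            merged.append(sat)  # masked: satellite estimate left uncorrected
            choice.append(None)
        elif abs(add - obs) <= abs(rat - obs):  # a tie goes to ADD (assumption)
            merged.append(add)
            choice.append("ADD")
        else:
            merged.append(rat)
            choice.append("RAT")
    return merged, choice


if __name__ == "__main__":
    print(combined_scheme([1.0, 2.0, 3.0], [1.5, 2.5, 3.1], [0.5, 2.1, 4.0], [None, 2.0, 3.0]))
```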

To evaluate the decision rule described in the previous paragraph, January 2004
(southern hemisphere summer) was selected to determine whether either correction scheme
is preferred over the other, or whether any geographical region tends to use one scheme
more than the other. Figure 1 shows, for each day, the percentage of pixels for which each
correction scheme was selected. The result is fairly steady throughout the month: around
54% of the pixels are corrected with the ADD scheme, while the RAT technique is chosen
for only 46%. Using these mean values, the difference between the percentage of days on
which a given scheme was chosen and the January 2004 mean was calculated for both
bias-removal techniques (ADD and RAT). The spatial distribution of the RAT relative bias
is presented in Figure 2a. By construction, the RAT and ADD relative biases (ADD not
shown) sum to zero. Over southern South America (approximately south of 20°S), the RAT
scheme is selected 20% above the average (46% in this case), while the opposite holds
over most of Brazil and Bolivia. The largest deviations from the average are observed
along the coasts of Chile, Peru, Colombia, Venezuela and the Guianas, which have the
sparsest gauge networks in the domain. One hypothesis for this behavior is that the
selection of a given scheme is related to the precipitation regime. Figure 2b shows close
agreement between the RAT relative bias and the accumulated monthly rainfall: larger
rainfall amounts are associated with negative RAT (positive ADD) values, so RAT is chosen
more frequently, and ADD less frequently, in regions with less rainfall. Other factors, such
as the circulation and gauge density, may also influence these results.
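The January 2004 diagnostics can be reproduced from a daily record of which scheme each pixel selected. This sketch assumes choices are stored as "ADD", "RAT" or `None` (masked) per pixel and day, and takes each pixel's percentage over the days on which it was unmasked (also an assumption). It computes the monthly mean RAT share and the per-pixel deviations from it, and shows why the ADD and RAT deviations sum to zero: each unmasked pixel selects exactly one of the two schemes. Function names are hypothetical.

```python
def rat_percent(choices):
    """Percentage of 'RAT' among non-masked entries, or None if all are masked."""
    used = [c for c in choices if c is not None]
    if not used:
        return None
    return 100.0 * sum(c == "RAT" for c in used) / len(used)


def selection_deviations(choices_by_day):
    """Return (monthly mean RAT %, per-pixel RAT deviation, per-pixel ADD deviation).

    choices_by_day[d][p] is "ADD", "RAT" or None for day d and pixel p.
    """
    daily = [p for p in (rat_percent(day) for day in choices_by_day) if p is not None]
    mean_rat = sum(daily) / len(daily)
    rat_dev, add_dev = [], []
    for p in range(len(choices_by_day[0])):
        # Percentage over the days on which pixel p was unmasked (assumption).
        pct = rat_percent([day[p] for day in choices_by_day])
        if pct is None:
            rat_dev.append(None)
            add_dev.append(None)
        else:
            rat_dev.append(pct - mean_rat)
            # ADD % is 100 - RAT % and its mean is 100 - mean_rat, so the deviations cancel.
            add_dev.append((100.0 - pct) - (100.0 - mean_rat))
    return mean_rat, rat_dev, add_dev


if __name__ == "__main__":
    days = [["RAT", "ADD", None], ["RAT", "ADD", None]]  # hypothetical 2-day, 3-pixel record
    print(selection_deviations(days))
```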

The third issue, about the validation strategy, will be described in the next section. 

4. Validation Strategy and Experimental Design 

To test this bias-removal technique, a daily rain gauge dataset for South America
during 2004 (see section 2b for more details) was used in two ways (for bias removal and
for validation) in a cross-validation process. Gauge reports from a randomly selected 10%
of the stations were withheld, and those from the remaining 90% of the stations were used
in the bias-removal process. This cross-validation process was repeated 10 times so that
each gauge was withheld once. The corrected rainfall estimate was then compared with the
corresponding observation to examine the performance of the proposed technique (Chen
et al. 2002). For comparison purposes, four other estimates were included in this study:
additive bias removal and ratio bias removal (as defined in the previous section) and the
research and real-time versions of TMPA (3B42V6 and 3B42RT). For the first two
correction schemes (ADD and RAT), the same cross-validation process was performed,
while for 3B42V6 and 3B42RT (control run), values were extracted and compared at the
same validation gauges (10% of rain gauges randomly selected, repeated 10 times) so that
all the statistical results are comparable. Table 1 shows the monthly mean (calculated on a
daily basis) of the bias (in mm), root mean square error (RMSE, in mm) and correlation
coefficient (CORR) for the five estimates for January, April, July and October 2004. Bold
values indicate the best result obtained for a particular month and statistical parameter.
CoSch performs better than ADD and RAT separately, and also better than 3B42V6. The
difference is most pronounced when RMSE and CORR are compared among the estimates.
Among these five estimates, the worst performance is for 3B42RT (control algorithm), to
which no rain gauge information is added. This result shows that the combined scheme
(CoSch) adds extra value to ADD and RAT used separately, retaining some local spatial
variability in daily rainfall.
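The withholding scheme and the three Table 1 statistics can be sketched as below. This is a hedged illustration rather than the authors' code: folds come from a seeded shuffle, bias is taken as the mean of estimate minus observation (the sign convention is an assumption), and correlation is the Pearson coefficient. `correct_and_predict` is a hypothetical callback standing in for ADD, RAT or CoSch.

```python
import math
import random


def make_folds(station_ids, n_folds=10, seed=0):
    """Split stations into n_folds disjoint random groups (about 10% each for 10 folds)."""
    ids = list(station_ids)
    random.Random(seed).shuffle(ids)
    return [ids[k::n_folds] for k in range(n_folds)]


def bias(est, obs):
    """Mean of estimate minus observation (sign convention is an assumption)."""
    return sum(e - o for e, o in zip(est, obs)) / len(obs)


def rmse(est, obs):
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(est, obs)) / len(obs))


def corr(est, obs):
    """Pearson correlation coefficient."""
    n = len(obs)
    me, mo = sum(est) / n, sum(obs) / n
    cov = sum((e - me) * (o - mo) for e, o in zip(est, obs))
    var_e = sum((e - me) ** 2 for e in est)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_e * var_o)


def cross_validate(station_ids, correct_and_predict, n_folds=10):
    """Withhold each fold once, correct with the rest, score at the withheld gauges.

    correct_and_predict(train_ids, test_ids) -> (estimates, observations) at test_ids.
    """
    est_all, obs_all = [], []
    folds = make_folds(station_ids, n_folds)
    for k, test_ids in enumerate(folds):
        train_ids = [s for j, f in enumerate(folds) if j != k for s in f]
        est, obs = correct_and_predict(train_ids, test_ids)
        est_all += est
        obs_all += obs
    return bias(est_all, obs_all), rmse(est_all, obs_all), corr(est_all, obs_all)
```

For simplicity this pools all withheld pairs into one score, whereas Table 1 reports monthly means of daily statistics; averaging per-day scores instead would only change the final aggregation.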

In 2003 the International Precipitation Working Group (IPWG) began a project to
validate and intercompare satellite rainfall estimates (Ebert et al., 2007). Categorical
statistics such as the bias score (BIAS), probability of detection (POD), false alarm ratio
(FAR) and Equitable Threat Score (ETS) can be computed for different rain-rate
thresholds, here 1, 2, 5, 10, 20 and 50 mm. All these parameters are computed from a
rain/no-rain contingency table and measure the performance of a given algorithm
(see Wilks, 1995 for more details). Figure 1 shows the annual mean of the
aforementioned categorical statistics (based on daily estimates) for 3B42RT, 3B42V6
and CoSch for all rainfall thresholds except 50 mm, because the small number of events
above that threshold could affect the robustness of the statistics. The performance of
CoSch is better for all rainfall thresholds. POD (Figure 1a) is higher for all thresholds,
suggesting that CoSch gets more correct estimates in each category, while FAR
(Figure 1b) is smaller for all categories, suggesting that CoSch produces fewer false
alarms than the other estimates. The bias score (BIAS, Figure 1c) shows similar values for
all estimates (close to one, the ideal value). Nevertheless, CoSch tends to overestimate the
frequency of events at the lower thresholds and underestimate it at the largest ones. ETS
(Figure 1d) measures the fraction of observed and/or estimated events that were correctly
estimated, adjusted for hits associated with random chance (for example, it is easier to
correctly forecast rain occurrence in a wet climate than in a dry climate). This score is
sensitive to hits and, because it penalizes misses and false alarms in the same way, it does
not distinguish the source of the error. In this case, the improvement is clear for all rainfall
thresholds when compared with 3B42RT and 3B42V6.
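The four categorical scores follow from the rain/no-rain contingency table at each threshold. The sketch below uses the standard textbook definitions (hits expected by chance are removed in ETS); counting a value equal to the threshold as an event (`>=`) is an assumption, and the function name is hypothetical. The scores are undefined when a threshold has no observed or no estimated events.

```python
def categorical_scores(est, obs, threshold):
    """POD, FAR, frequency bias (BIAS) and ETS for daily amounts at one threshold (mm)."""
    hits = false_alarms = misses = correct_negatives = 0
    for e, o in zip(est, obs):
        est_event, obs_event = e >= threshold, o >= threshold  # ">=" is an assumption
        if est_event and obs_event:
            hits += 1
        elif est_event:
            false_alarms += 1
        elif obs_event:
            misses += 1
        else:
            correct_negatives += 1
    n = hits + false_alarms + misses + correct_negatives
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    bias_score = (hits + false_alarms) / (hits + misses)
    hits_random = (hits + misses) * (hits + false_alarms) / n  # hits expected by chance
    ets = (hits - hits_random) / (hits + misses + false_alarms - hits_random)
    return {"POD": pod, "FAR": far, "BIAS": bias_score, "ETS": ets}


if __name__ == "__main__":
    est = [0.0, 5.0, 5.0, 0.0, 12.0]  # hypothetical daily amounts in mm
    obs = [5.0, 5.0, 0.0, 0.0, 15.0]
    for thr in (1, 2, 5, 10):
        print(thr, categorical_scores(est, obs, thr))
```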

To further quantify the impact of gauge network density on the accuracy of the different
estimates, cross-validation tests were conducted using only a randomly selected 10% of
the available data to perform the additive, ratio and combined schemes. Another 10%
(excluding the gauges chosen to perform the correction) was used to validate the results of
these schemes and also of 3B42RT and 3B42V6. This experiment was carried out 10 times
so that each gauge was withheld once, and both results (using 90% of the gauges in one
case and only 10% to perform the correction in the second experiment) are statistically
comparable. This analysis makes it possible to examine the impact of varying gauge
density on the quantitative accuracy of the methodology. Table 2 shows the same
statistical parameters as Table 1, but with only 10% of the gauges used to perform the
bias removal. As expected, the performance of these methodologies (CoSch, ADD and
RAT) improves with increasing gauge network density, while the other estimates
(3B42RT and 3B42V6) show approximately the same values because the number of
gauges used for validation remains the same (10% randomly selected for each of the 10
experiments). Nevertheless, even when only 10% of the available gauges are used in the
bias-removal process, the technique shows better results than 3B42V6 (which uses GPCC
data to perform a bias removal, as explained in section 1) and 3B42RT. CoSch also
performs better than ADD and RAT separately. However, in this case the differences
between CoSch and RAT and between CoSch and ADD are smaller than in the previous
analysis, suggesting that, for very sparse rain gauge networks, the added value of the
combination is less pronounced. A similar pattern can be observed in the categorical
statistics (Figure 2). The performance of CoSch is better for all rainfall thresholds, but, as
expected, the difference is smaller than in the previous analysis. POD (Figure 2a) is higher
for all thresholds, suggesting that, despite the very sparse network, CoSch gets more
correct estimates in each category, while FAR (Figure 2b) is smaller, suggesting that
CoSch produces fewer false alarms than the other estimates in all categories. The bias
score (BIAS, Figure 2c) shows similar values for all estimates (close to one, the ideal
value), while ETS (Figure 2d) shows an improvement for all rainfall thresholds when
compared with 3B42RT and 3B42V6.
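The degraded-network experiment pairs two disjoint 10% subsets in each of the ten runs. One way to build such pairs, assuming the correction subset for run k is simply the next fold (the paper does not say how the two subsets were drawn), is sketched below; the function names are hypothetical.

```python
import random


def make_folds(station_ids, n_folds=10, seed=0):
    """Split stations into n_folds disjoint random groups."""
    ids = list(station_ids)
    random.Random(seed).shuffle(ids)
    return [ids[k::n_folds] for k in range(n_folds)]


def sparse_network_runs(station_ids, n_folds=10, seed=0):
    """Return (correction_ids, validation_ids) pairs for the degraded-network test.

    Assumption: run k validates on fold k and corrects with fold k+1 (wrapping
    around), so the two subsets never overlap and every gauge is validated once.
    """
    folds = make_folds(station_ids, n_folds, seed)
    return [(folds[(k + 1) % n_folds], folds[k]) for k in range(n_folds)]


if __name__ == "__main__":
    for correction_ids, validation_ids in sparse_network_runs(range(30)):
        print(sorted(correction_ids), sorted(validation_ids))
```

Because the validation folds are the same ones used in the full-network test when the same seed is kept, the two experiments are scored against identical gauge subsets, which is what makes Tables 1 and 2 comparable.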

5. Summary and conclusions 

A comprehensive assessment has been performed to examine the performance of a new
methodology (CoSch) to merge satellite estimates and daily gauge data over South
America during 2004. For comparison purposes, 3B42RT (control algorithm) and
3B42V6 (which also includes monthly gauge data from GPCC) were also included in
this analysis. Two intermediate results used in the combined scheme (ADD and RAT)
were also examined in order to determine how the proposed methodology works.

Inter-comparisons and cross-validation tests have been carried out for the control
algorithm and the different merging schemes over the South American region during 2004,
for months belonging to different seasons and for different network densities.
The results can be summarized as follows:

• The choice of bias-removal technique seems to be related to the rainfall
regime: the ratio-based scheme is selected more often than its mean frequency
where rainfall is lower, while the additive scheme is favored where rainfall is
higher.

• The RMSE and correlation coefficient show that the combined scheme (CoSch)
performs better than ADD and RAT separately, suggesting that extra value is added
when the proposed scheme is used. CoSch also shows the best results of all the
merged schemes analyzed.

• The control algorithm (3B42RT) presents the poorest performance. This result is
expected because this algorithm does not use any gauge data, while 3B42V6, which
includes only the GPCC monthly data, tends to improve all statistical parameters
relative to 3B42RT when validated against an independent gauge dataset.

• In terms of performance for different rainfall thresholds, CoSch again shows the
best performance when compared with the other merging techniques and the control run.

• The quality of the rainfall estimate degrades as the gauge network used becomes
sparser. Nevertheless, the result is still better than those based on monthly gauge data,
in particular because CoSch is applied directly at the daily timescale rather than
disaggregated from monthly mean precipitation.

Based on these results, future work will focus on evaluating this technique under
different rainfall regimes and in other regions of the world. This approach could be
replicated using different control algorithms (e.g., CMORPH) in order to provide the
scientific community with a suite of high-resolution, high-quality gauge-satellite based
analyses of daily precipitation over land in global and regional domains. Meanwhile,
because it produces high-resolution precipitation datasets constrained by daily
observations, which are suitable for land surface and weather applications, this technique
has been identified as one of the best candidates for producing the precipitation forcing
for the South American LDAS over the entire continent.